Hamiltonian model for multidimensional epistasis 
We propose and solve a Hamiltonian model for multidimensional epistatic interactions between
beneficial mutations. The model gives rise either to a phase transition between two
equilibrium states, without any coexistence, or to a state where hybrid species can coexist,
with gradual passage from one wild type to another. The transition takes place as a function of the
tolerance of the environment, which we define as the amount of noise in the system.

I. INTRODUCTION

Evolution takes place via natural selection, whereby 
random mutations which have a salutary effect on the 
fitness (survival probability and/or reproductive capabil- 
ity) of the individual persist in the population and lead 
to new variants; other, neutral mutations may simply be 
carried along since they do not affect the well-being of
the individual. Deleterious mutations usually affect the
organism adversely, and the accumulation of too many
will reduce the fitness drastically. Each "species" actually
consists of a more or less narrow distribution in the
phase space of all possible genetic states, and this
distribution may shift, in response to environmental pressure,
along definite evolutionary routes.

An interesting problem is to explain the rather fast
rates at which populations seem to be able to adapt to
changing environments. These rates suggest that the fitness
does not depend in a simple additive way on the effects
of each independent mutation, but that there is a nonlinear
relationship, or epistasis, between the effects
of point mutations determining the fitness function. In
fact, one may surmise that evolution is not a Markovian
game, but that the fitness depends upon the history of
the successive mutations; in other words, it is a function
of the path taken in genomic space. Thus, for a mutation
leading to the development of fingers to be beneficial,
say, one must already have had a mutation leading to the
formation of limbs.

Posed in this way, this problem seems to demand an 
analysis that is intrinsically dynamical. Yet, it actually 
lends itself to a treatment in terms of statistical equilib- 
ria, with the appropriate choice of a fitness function. In 
this paper we aim to provide such a fitness function, and 
solve the resulting model for possibly coexisting equilib- 
rium phases indicating different species. 



II. THE MODEL 

Since Eigen first introduced the quasi-species model,
there has been a huge amount of both analytical and
numerical work on bitstring models of genetic evolution,
where the genotype of an individual is represented
by a string of Boolean variables $\sigma_i$, $i = 1, \ldots, N$.
If one takes the wild type, or the initial genotype, to consist
of a string of 0's, each point mutation is indicated by
flipping the bit representing a given gene from 0 to 1.
The number of mutations $m$ is then the number of 1's on
the whole string, i.e., $m = \sum_i \sigma_i$. The fitness is usually
taken simply to be a function, albeit nonlinear, of $m$.

Clearly, each $i$th variable can be considered as an
independent direction in phase space, so that evolution
takes place in an $N$-dimensional space, where $N$ is the
length of the genome. The genotype is a vertex on an
$N$-dimensional unit hypercube, and if only single flips
from 0 to 1 are allowed at a time, the path of evolution
is a random walk on the edges of this hypercube.
One possible way in which a vector variable can be
introduced is to consider the whole vector $V = \{\sigma_i\}$ as
the argument of the fitness function. Since the position
of each gene on this particular string can be assigned
with some arbitrariness, one may then demand that the
fitness is only increased relative to the wild type if the
bits that flip to 1 occur sequentially. Thus, (0, 0, ...),
(1, 0, ...), (1, 1, 0, ...) are in increasing order of fitness, while
(0, 1, ...) is less fit.
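The ordering rule above depends only on the length of the uninterrupted run of 1's at the start of the string. A minimal sketch of that bookkeeping in Python follows; the function name `leading_ones` and the three-bit examples are introduced here for illustration and are not part of the original.

```python
def leading_ones(bits):
    """Return the length of the uninterrupted run of 1's at the start of bits."""
    count = 0
    for b in bits:
        if b != 1:
            break
        count += 1
    return count


# The genotypes named in the text, truncated to three bits for illustration.
examples = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
runs = [leading_ones(g) for g in examples]
```

Under this rule the out-of-order genotype (0, 1, 0) has a run of length 0, the same as the wild type, so its single mutation earns no sequential credit.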

This demands that we introduce a cost function $H$
which depends on the state $V$, and we have chosen the
fitness $f$ to decrease exponentially with this cost
function, viz.,

$$f(V) = e^{-\beta H(V)} , \qquad (1)$$

where $\beta$ is a measure of how effective the cost function
is in affecting the fitness. The fitness function $f$ can
be identified as the Boltzmann factor in an equilibrium
statistical model with the Hamiltonian $H$, at constant
inverse "temperature" $\beta$. Temperature may be seen
as the amount of randomness, or disorder, in the system,
competing with the cost function in determining the
fitness. The higher the temperature, or randomness, the
weaker will be the effect of the cost function in
determining the state of the system. Therefore we define

$$T = \beta^{-1} \qquad (2)$$

as the tolerance in the system.
For the cost function we will borrow a Hamiltonian
introduced by Bakk et al. in the context of protein
folding, where it is of importance that the folding events
take place in a prescribed order. Thus,

$$H = -\lambda J \sum_{m=1}^{N} \prod_{i=1}^{m} \sigma_i - (1-\lambda) J \prod_{i=1}^{N} \sigma_i . \qquad (3)$$



It can be seen that for $\lambda = 0$ the only state which is
favorable is that with all $\sigma_i = 1$, whereas for $0 < \lambda \le 1$
all states with an uninterrupted initial sequence of 1's of
arbitrary length $m$ lead to improved fitness. Here $J$ is
a measure of the strength of the interaction between the
states $\sigma_i$ of each of the sites (alleles). Clearly, $\beta$ and $J$
will always occur together in this model, in the product
$\beta J$, and we may simply absorb $J$ into the definition of $\beta$.
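The cost function can be evaluated directly for any bitstring. The sketch below assumes that Eq. (3) reads $H = -\lambda J \sum_{m=1}^{N} \prod_{i=1}^{m} \sigma_i - (1-\lambda) J \prod_{i=1}^{N} \sigma_i$; the function names `cost` and `fitness` are hypothetical and introduced here only for illustration.

```python
import math


def cost(bits, lam, J=1.0):
    """Cost H for a tuple of 0/1 values, under the assumed form of Eq. (3)."""
    n = len(bits)
    run = 0
    for b in bits:
        if b != 1:
            break
        run += 1
    # prod_{i<=m} sigma_i equals 1 exactly when m <= run, so the sum over m is `run`.
    sequential_term = -lam * J * run
    # The full product over all N sites is 1 only when every bit is 1.
    all_ones_term = -(1.0 - lam) * J if run == n else 0.0
    return sequential_term + all_ones_term


def fitness(bits, lam, beta, J=1.0):
    """Unnormalized fitness f = exp(-beta * H)."""
    return math.exp(-beta * cost(bits, lam, J))
```

With `lam = 0` only the all-ones string lowers the cost, while with `lam = 1` every additional leading 1 lowers it by `J`, in line with the two cases described in the text.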

The fitness $f$ may be normalized to take values in the interval
(0, 1) if we divide the expression in Eq. (1) by the sum
over all states $V$, namely,

$$Z = \sum_{\{\sigma_i\}} e^{-\beta H[\{\sigma_i\}]} . \qquad (4)$$



This sum may be performed exactly, to give

$$Z = \frac{2^N - e^{\beta\lambda N}}{2 - e^{\beta\lambda}} + e^{[\lambda(N-1)+1]\beta} . \qquad (5)$$
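The exact summation can be checked numerically for small genomes. The sketch below compares a brute-force sum over all $2^N$ bitstrings with a geometric-series form obtained by grouping states by the length $k$ of their initial run of 1's (there are $2^{N-k-1}$ such states for $k < N$). It assumes the Hamiltonian form $H = -\lambda \sum_m \prod_{i \le m} \sigma_i - (1-\lambda) \prod_i \sigma_i$ with $J$ absorbed into $\beta$; all function names are introduced here for illustration.

```python
import itertools
import math


def leading_ones(bits):
    """Length of the uninterrupted run of 1's at the start of bits."""
    run = 0
    for b in bits:
        if b != 1:
            break
        run += 1
    return run


def partition_brute(n, lam, beta):
    """Sum exp(-beta * H) over all 2**n bitstrings (J absorbed into beta)."""
    total = 0.0
    for bits in itertools.product((0, 1), repeat=n):
        run = leading_ones(bits)
        energy = -lam * run - ((1.0 - lam) if run == n else 0.0)
        total += math.exp(-beta * energy)
    return total


def partition_closed(n, lam, beta):
    """Geometric-series form of the same sum, grouped by run length."""
    r = math.exp(beta * lam)
    if math.isclose(r, 2.0):
        bulk = n * 2.0 ** (n - 1)  # limit of the ratio when exp(beta*lam) = 2
    else:
        bulk = (2.0 ** n - r ** n) / (2.0 - r)
    # The all-ones state contributes exp(beta * [lam*(n-1) + 1]).
    return bulk + math.exp(beta * (lam * (n - 1) + 1.0))
```

The brute-force routine is only practical for small `n`; the closed form is what one would use for genome lengths such as $N = 100$.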



Using this result we may compute the expectation values
(average values) of the quantities $\psi_m = \prod_{i=1}^{m} \sigma_i$, which
we shall call $\Psi_m = \langle \psi_m \rangle$, for $m = 1, \ldots, N$. Clearly, $\Psi_m$
is the probability that, in equilibrium, at least the first $m$
loci on the genotype have switched to 1. One finds,
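The expectation value follows from restricting the sum over states in $Z$ to genotypes whose first $m$ bits are all 1. A sketch of that calculation, assuming the Hamiltonian form $H = -\lambda \sum_m \prod_{i \le m} \sigma_i - (1-\lambda)\prod_i \sigma_i$ with $J$ absorbed into $\beta$, and using the shorthand $a \equiv e^{\beta\lambda}/2$ (a symbol introduced here, not in the original):

```latex
\begin{align}
\Psi_m &= \frac{1}{Z}\left[\sum_{k=m}^{N-1} 2^{N-k-1}\, e^{\beta\lambda k}
          + e^{[\lambda(N-1)+1]\beta}\right] \\
       &= \frac{\dfrac{a^m - a^N}{1-a} + 2a^N e^{\beta(1-\lambda)}}
               {\dfrac{1 - a^N}{1-a} + 2a^N e^{\beta(1-\lambda)}},
\qquad a \equiv \frac{e^{\beta\lambda}}{2}.
\end{align}
```

Here $k$ labels the length of the initial run of 1's, and each $k < N$ is shared by $2^{N-k-1}$ genotypes. At $\beta = 0$ the bracket reduces to $2^{N-m}$ and $Z$ to $2^N$, which gives $(1/2)^m$.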
Clearly, as $\beta \to 0$, $\Psi_m \to (1/2)^m$, so that it is convenient
to define the order parameters

$$\Phi_m = \frac{\Psi_m - (1/2)^m}{1 - (1/2)^m} , \qquad (7)$$



which take values in the interval (0, 1). In Fig. 1 we
present the results of a numerical evaluation of Eq. (7) for
$\lambda = 0$, and in Fig. 2 for $\lambda = 1$, as a function of $x = T/J$,
which is the ratio of the tolerance in the system to the
strength of the epistatic interactions.
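A numerical evaluation of the order parameters can be sketched as follows. This is not the authors' code: it assumes the Hamiltonian form $H = -\lambda \sum_m \prod_{i \le m} \sigma_i - (1-\lambda)\prod_i \sigma_i$ with $J = 1$, groups states by the length of their initial run of 1's, and works with log-weights so that the exponentials do not overflow at $N = 100$. The names `psi` and `phi` are introduced here for illustration.

```python
import math


def psi(m, n, lam, beta):
    """Probability that at least the first m bits are 1 (J absorbed into beta)."""
    # Log-weights grouped by run length k = 0..n-1, followed by the all-ones state.
    logs = [(n - k - 1) * math.log(2.0) + beta * lam * k for k in range(n)]
    logs.append(beta * (lam * (n - 1) + 1.0))
    shift = max(logs)  # subtract the largest exponent to avoid overflow
    weights = [math.exp(v - shift) for v in logs]
    return sum(weights[m:]) / sum(weights)


def phi(m, n, lam, x):
    """Order parameter Phi_m at x = T/J = 1/beta."""
    base = 0.5 ** m
    return (psi(m, n, lam, 1.0 / x) - base) / (1.0 - base)


# Scan a few tolerances for N = 100, the genome length used in the figures.
N = 100
for x in (0.01, 0.02, 0.5, 1.0, 2.0):
    print(x, phi(1, N, 0.0, x), phi(50, N, 1.0, x))
```

Plotting `phi(m, N, lam, x)` against `x` for several `m` gives curves of the kind described for Figs. 1 and 2.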

We find that for $\lambda = 0$ there is a sharp transition
for large $N$ at $x = x_t$, below which the genotype is
completely ordered, with all $\sigma_i = 1$, while for $x > x_t$ the
whole population is in the state with all $\sigma_i = 0$. From
an inspection of the equations above, one sees that $x_t = (N \ln 2)^{-1}$.
Thus there are only two possible species in this case, with
no coexistence between them. However, for $N \to \infty$ the
threshold itself goes to zero. (This can be mended if the
strength of the second term in Eq. (3) is chosen to be $NJ$
rather than $J$.)
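The location of the threshold can be checked by grouping the states by the length of their initial run of 1's. A sketch for $\lambda = 0$, with $J$ absorbed into $\beta$ and assuming the Hamiltonian then reduces to $H = -\prod_{i=1}^{N}\sigma_i$ (the labels $\beta_t$, $\beta_t'$ and $x_t'$ are introduced here):

```latex
\begin{align}
Z\big|_{\lambda=0} &= (2^N - 1) + e^{\beta}, \qquad
\Psi_m\big|_{\lambda=0} = \frac{2^{N-m} - 1 + e^{\beta}}{2^N - 1 + e^{\beta}}, \\
% the all-ones state dominates once e^{\beta} is comparable to 2^N
e^{\beta_t} &\simeq 2^N \;\Longrightarrow\; \beta_t \simeq N\ln 2,
\qquad x_t = \beta_t^{-1} \simeq (N \ln 2)^{-1}, \\
% if the all-ones term has strength NJ, its weight becomes e^{N\beta}
e^{N\beta_t'} &\simeq 2^N \;\Longrightarrow\; x_t' \simeq (\ln 2)^{-1}.
\end{align}
```

The last line illustrates why rescaling the all-ones term by $N$ keeps the threshold finite as $N \to \infty$.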

For $\lambda = 1$, it can be seen that the sharp phase
transition is no longer present (the nonzero value of $\lambda$ has an
effect similar to turning on a magnetic field in a
magnetic phase transition). For large $x$, all the $\Phi_m$ decay
exponentially, as $\sim e^{-m/x}$. However, there exist
effective thresholds $x_m$, for $m \ge 1$, below which there is a
nonzero probability of encountering individuals with $m$
initial alleles switched to 1. This signifies that at any
given $x_{m+1} < x < x_m$, there is coexistence between $m$
hybrid species, with the first $n \le m$ alleles in the mutated
state. The probability of encountering an individual with
at least $m$ sequentially mutated alleles is in fact precisely $\Psi_m$.
We see that $\Phi_N \simeq 0$ for $x > x_N$, with $x_N \simeq 1/\ln 2$.

To further elucidate the meaning of "tolerance," we
may compute the relative variances $v_m$, where

$$v_m^2 = \langle (\psi_m - \Psi_m)^2 \rangle / \Psi_m^2 .$$

It is trivial to note that $\psi_m^2 = \psi_m$, so that
$v_m^2 = (1 - \Psi_m)/\Psi_m$. Then it is straightforward enough to get

$$v_m^2 = \frac{1 - (e^{\beta\lambda}/2)^m}{(e^{\beta\lambda}/2)^m + (e^{\beta\lambda}/2)^N \left[ 2e^{-\beta(\lambda-1)} - e^{\beta} - 1 \right]} . \qquad (8)$$
One may see from here that the behaviour of the system
is determined by the critical value of $\beta\lambda$ at $\ln 2$, and
moreover that the variance (or the departure from the
ordered phase) also depends on whether $m$ is small or of
the order of $N$, as can also be seen clearly from Fig. 2.
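The closed form for the relative variance can be compared against direct enumeration on a small genome. The sketch below assumes that Eq. (8) reads $v_m^2 = [1 - a^m] / \{a^m + a^N [2e^{-\beta(\lambda-1)} - e^{\beta} - 1]\}$ with $a = e^{\beta\lambda}/2$, and that the Hamiltonian is $H = -\lambda \sum_m \prod_{i \le m} \sigma_i - (1-\lambda)\prod_i \sigma_i$ with $J$ absorbed into $\beta$; the function names are introduced here for illustration.

```python
import itertools
import math


def psi_brute(m, n, lam, beta):
    """<psi_m> by direct enumeration; psi_m = 1 iff the first m bits are all 1."""
    z = 0.0
    hit = 0.0
    for bits in itertools.product((0, 1), repeat=n):
        run = 0
        for b in bits:
            if b != 1:
                break
            run += 1
        energy = -lam * run - ((1.0 - lam) if run == n else 0.0)
        w = math.exp(-beta * energy)
        z += w
        if run >= m:
            hit += w
    return hit / z


def variance_brute(m, n, lam, beta):
    """Relative variance (1 - Psi_m) / Psi_m from the enumerated average."""
    p = psi_brute(m, n, lam, beta)
    return (1.0 - p) / p


def variance_closed(m, n, lam, beta):
    """Assumed closed form of Eq. (8), with a = exp(beta*lam) / 2."""
    a = math.exp(beta * lam) / 2.0
    bracket = 2.0 * math.exp(-beta * (lam - 1.0)) - math.exp(beta) - 1.0
    return (1.0 - a ** m) / (a ** m + a ** n * bracket)
```

Both routines depend on $\beta$ and $\lambda$ through $a$, whose value relative to 1 corresponds to $\beta\lambda$ relative to $\ln 2$.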



III. CONCLUSION 

In summary, we have presented a Hamiltonian model
of multidimensional epistasis which weights only certain
paths in genotype space as being favorable. The model
has a tunable strength ($J$) of interactions between different
genes, which can be absorbed into an overall parameter
($\beta$) which determines how tolerant the environment is
to deviations from the wild type, as well as a parameter
($\lambda$) which decides whether coexistence between hybrid
individuals will or will not be allowed. The model exhibits a
transition between two pure types as a function of $\beta$ for
$\lambda = 0$. For $\lambda \neq 0$, low tolerances $T = \beta^{-1}$ give rise to
the appearance of hybrid types, in case a given series of
mutations increases the fitness.

Figure Captions

Fig. 1. The order parameters $\Phi_m$ for $\lambda = 0$ all differ
from zero at the same transition point. Here the length
of the genome is 100. There are no hybrid species.

Fig. 2. The order parameters $\Phi_m$ for $\lambda = 1$, with
$N = 100$. There is a set of $N$ effective thresholds, below
which hybrid species arise, with $m$ sequentially mutated
alleles.